MSKIDS volumetric analysis

library(tidyverse)
library(broom)

# load the data
df <- read.csv('data/harmonized_data_PNC+MS.csv', stringsAsFactors = TRUE) %>% 
  mutate(MS = as.factor(MS)) %>% 
  mutate_at(vars(starts_with('X')), ~ . *0.001) %>% # convert mm^3 to cm^3
  rename(ICV = X702,
         Sex = sex, 
         Age = age) %>% 
  mutate(Sex = str_to_title(Sex))

# capture roi indices as vector for functions
ROI_INDICES <- df %>% 
  select(starts_with("X")) %>% 
  colnames() 

# load the ROI dictionary
dict_df <- readxl::read_excel('data/MUSE_ROI_Dict.xlsx') %>% 
  filter(ROI_NAME == 'ICV' | ROI_INDEX < 300) %>% 
  select(ROI_INDEX, TISSUE_SEG, ROI_NAME) %>% 
  mutate(ROI_INDEX = paste0('X', ROI_INDEX),
         TISSUE_SEG = recode(TISSUE_SEG, `GM+WM+VN+BS+cCSF` = 'ICV')) %>% 
  filter(ROI_INDEX %in% ROI_INDICES) %>% 
# exclude abbreviation in significant rois
  mutate(ROI_NAME = str_replace(ROI_NAME, "( AnG| Calc| FRP| FuG| IOG| MCgG| PrG| PO)", "") %>% 
           str_replace(" Inf ", " Inferior ") %>% 
           str_replace(" Vent$", " Ventricle ") %>% 
           str_replace(" Lat ", " Lateral ") %>% 
           str_to_title() %>% 
           str_replace("Wm", "WM")
         )
  

# to exclude all abbreviations from all rois (currently fails)
# roi_abbreviations = ' Ofug |Ocp | Mtg | Msfg | Mpog | Morg  | Mog | Mfc | Mcgg | Lorg | Lig | Itg | Iog | Gre | Fug | Frp | Fo | Ent | Cun | Co | CO | Acgg | ACgG | AIns | AOrG | AnG | Calc | FRP | FuG | IOG | MCgC | PO | Opifg | Orifg | Pcgg | Pcu | Phg | Pins | Pog |  Porg | Pp | Prg | Pt | Sca | Sfg | Smc | Smg | Sog | Spl | Stg | Tmp | Trifg | Ttg | Fo | '

This report examines the effect of Multiple Sclerosis (MS) on brain volumes in a pediatric sample. The data used in this analysis consists of a harmonized dataset of volumetric measurements of 145 brain regions (and intracranial volume) as well as covariates such as age, sex and MS status.

Our primary aim was to investigate the effect of having MS on the 145 brain regions. For our secondary aim, we sought to investigate the interaction between MS status and sex in the same brain regions.

The 145 brain regions of interest are listed below. Throughout the report we use the abbreviations ‘GM’, ‘WM’ and ‘VN’ for Gray Matter, White Matter and Ventricular regions respectively.

dict_df %>% 
  select(ROI_NAME, TISSUE_SEG)

Multiple linear regression models were performed to investigate the relationship between MS and brain volume measurements in these regions. The models were run separately for each brain region of interest (ROI) and the results were reported for the ROIs that showed significant effects after adjusting for multiple corrections using the false discovery rate method. The results of these models are accompanied by box plots to help interpret the findings. This analysis shows that MS is associated with specific brain volume differences in a pediatric population.

Sample size

Total sample size: 1353.

Sample size per MS status and sex:

# print cell sizes and total sample size
df %>% 
  mutate(MS=fct_recode(MS, Yes = "1", No = "0")) %>% 
  count(Sex, MS)

Models

Primary aim

For our primary analysis, multiple regression models were run to investigate the relationship between MS and brain volumes. The models were run separately (i.e. in parallel) for each brain region of interest (ROI) and the results are reported for ROIs that showed significant associations with MS. The models controlled for Sex, Age, Age Squared, Age by Sex interaction, and Intracranial Volume (ICV).

Model equation: \[\begin{align} Volume &= \beta_0 + \beta_1*Sex + \beta_2*Age + \beta_3*Age^2 + \beta_4*Sex*Age + \beta_5*ICV + \beta_6*MS + \epsilon \end{align}\]

Significant main effects: MS

With non-MS controls as the reference group, the main effect in milliliters (\(mL\)) of having MS, after accounting for control variables, is shown in the estimate column for each significant ROI in the table below; 95% confidence intervals for the effect are shown in the confidence.interval column. Individual effects are described below the table.

#####
# Main effect Model: ROI ~ Sex + Age + Age^2 + Sex\*Age + MS + ICV
#####
run_model <- function(outcome){
  #' run model for each ROI
  f <- as.formula(sprintf("%s ~ Sex + Age + Age^2 + Sex*Age + MS + ICV", outcome))
  res <- lm(f, data = df) 
}

# run models searately
models <- ROI_INDICES %>% 
  purrr::map(run_model) 

# tidy models
models_df <- models %>% 
  purrr::map(tidy) %>% 
  setNames(ROI_INDICES) %>% 
  bind_rows(.id = 'ROI_INDEX') %>%
  left_join(dict_df, by = c('ROI_INDEX')) %>% # join with ROI dictionary
  select(ROI_INDEX, ROI_NAME, TISSUE_SEG, everything())
# use FDR correction
sig_main_effects_df <- models_df %>% 
  filter(term == 'MS1') %>% 
  mutate(p.value.adj = p.adjust(p.value, method='fdr')) %>% 
  # add confidence intervals:
  add_column(confidence.interval = purrr::map(models, confint, parm = "MS1") %>% purrr::map(~sprintf("(%.3f, %.3f)", .x[1], .x[2])) |> unlist(),
             .after = 'estimate') %>% 
  filter(p.value.adj < 0.05) %>% 
  mutate(p.value = ifelse(p.value < 0.0001, 
                          "< 0.0001",
                          as.character(round(p.value, 4))),
         p.value.adj = ifelse(p.value.adj < 0.0001, 
                          "< 0.0001",
                          as.character(round(p.value.adj, 4))),
         estimate = round(estimate, 3)) %>%
  arrange(by = TISSUE_SEG)
# show pretty
sig_main_effects_df %>% 
  select(-ROI_INDEX, -std.error, -statistic, -term)
# function to describe results
describe_main_effect <- function(row){
  sprintf("The %s volume of MS patient is %.3f mL %s, on average, compared to that of non-MS controls (95%% CI: %s, p %s).",
         row$ROI_NAME, 
         round(abs(row$estimate), 3), 
         ifelse(sign(row$estimate) == 1, 'bigger', 'smaller'), 
         row$confidence.interval,
         ifelse(str_detect(row$p.value.adj, "<"), row$p.value.adj, sprintf('= %s', row$p.value.adj))
        ) |> invisible()
}

# split by rows
rows <- sig_main_effects_df %>% 
  split(seq_len(nrow(sig_main_effects_df)))

for (effect_row in rows){ # make list output
  cat("\n")
  cat("-", describe_main_effect(effect_row), "\n")
}
  • The Right Thalamus Proper volume of MS patient is 0.446 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.561, -0.331), p < 0.0001).

  • The Left Thalamus Proper volume of MS patient is 0.446 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.564, -0.328), p < 0.0001).

  • The Left Angular Gyrus volume of MS patient is 0.575 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.923, -0.226), p = 0.0179).

  • The Left Calcarine Cortex volume of MS patient is 0.243 mL bigger, on average, compared to that of non-MS controls (95% CI: (0.091, 0.395), p = 0.0216).

  • The Right Frontal Pole volume of MS patient is 0.179 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.285, -0.073), p = 0.0167).

  • The Left Frontal Pole volume of MS patient is 0.214 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.337, -0.091), p = 0.0138).

  • The Right Fusiform Gyrus volume of MS patient is 0.358 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.597, -0.119), p = 0.0324).

  • The Right Inferior Occipital Gyrus volume of MS patient is 0.299 mL bigger, on average, compared to that of non-MS controls (95% CI: (0.093, 0.506), p = 0.0415).

  • The Left Middle Cingulate Gyrus volume of MS patient is 0.306 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.464, -0.149), p = 0.0042).

  • The Left Parietal Operculum volume of MS patient is 0.181 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.298, -0.064), p = 0.0278).

  • The Right Precentral Gyrus volume of MS patient is 0.532 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.863, -0.200), p = 0.0216).

  • The 3rd Ventricle volume of MS patient is 0.175 mL bigger, on average, compared to that of non-MS controls (95% CI: (0.123, 0.228), p < 0.0001).

  • The Right Inferior Lateral Ventricle volume of MS patient is 0.069 mL bigger, on average, compared to that of non-MS controls (95% CI: (0.040, 0.097), p < 0.0001).

  • The Left Inferior Lateral Ventricle volume of MS patient is 0.041 mL bigger, on average, compared to that of non-MS controls (95% CI: (0.018, 0.064), p = 0.0132).

  • The Left Lateral Ventricle volume of MS patient is 1.174 mL bigger, on average, compared to that of non-MS controls (95% CI: (0.399, 1.950), p = 0.0314).

  • The Frontal Lobe WM Left volume of MS patient is 1.979 mL bigger, on average, compared to that of non-MS controls (95% CI: (0.798, 3.160), p = 0.0167).

  • The Corpus Callosum volume of MS patient is 0.458 mL smaller, on average, compared to that of non-MS controls (95% CI: (-0.781, -0.135), p = 0.0472).

Secondary aim

For our secondary analysis, we ran models that included an interaction term of MS by sex.

Model equation:

\[\begin{align} Volume &= \beta_0 + \beta_1*Sex + \beta_2*Age + \beta_3*Age^2 + \beta_4*Sex*Age + \beta_5*ICV + \beta_6*MS*Sex + \epsilon \end{align}\]

Significant interaction effects: MS by Sex

With MS females as the reference group, the interaction effect in milliliters (\(mL\)) of having MS and being male is shown in the estimate column for each significant ROI. These effects are independent of the main effect of sex.

#####
# Interaction effect Model: ROI ~ Sex + Age + Age^2 + Sex\*Age + MS*Sex + ICV----
#####
run_model <- function(outcome){
   #' run model for one ROI
  f <- as.formula(sprintf("%s ~ Sex + Age + Age^2 + Sex*Age + MS*Sex + ICV", outcome))
  res <- lm(f, data = df)
}

# run models separately for each ROI
models <- ROI_INDICES %>% 
  purrr::map(run_model) 

# tidy models
models_df <- models %>% 
  purrr::map(tidy) %>% 
  setNames(ROI_INDICES) %>% 
  bind_rows(.id = 'ROI_INDEX') %>% 
  left_join(dict_df, by = c('ROI_INDEX')) %>% 
  select(ROI_INDEX, ROI_NAME, TISSUE_SEG, everything())

# select effects/terms of interest and adjust p values. keep significant effects
sig_interaction_effects_df <- models_df %>% 
  filter(term %in% c('SexMale:MS1')) %>% 
  mutate(p.value.adj = p.adjust(p.value, method='fdr')) %>% 
  add_column(confidence.interval = purrr::map(models, confint, parm = 'SexMale:MS1') %>% purrr::map(~sprintf("(%.3f, %.3f)", .x[1], .x[2])) |> unlist(),
             .after = 'estimate') %>% 
  filter(p.value.adj < 0.05) %>% 
  mutate(p.value = ifelse(p.value < 0.0001, 
                          "< 0.0001",
                          as.character(round(p.value, 4))),
         p.value.adj = ifelse(p.value.adj < 0.001, 
                          "< 0.0001",
                          as.character(round(p.value.adj, 4))),
         estimate = round(estimate, 3)) %>% 
  arrange(by = TISSUE_SEG)

# show results
sig_interaction_effects_df %>% 
  filter(term == 'SexMale:MS1') %>% 
  select(-ROI_INDEX, -std.error, -statistic, -term)
# describe each result
describe_interaction_effect <- function(row){
  sprintf("The %s volume of MS males is %.3f mL %s, on average, compared to that of MS females after accounting for the main effect of MS (95%% CI: %s, p %s).",
         row$ROI_NAME, 
         round(abs(row$estimate), 3), 
         ifelse(sign(row$estimate) == 1, 'bigger', 'smaller'), 
         row$confidence.interval,
         ifelse(str_detect(row$p.value.adj, "<"), row$p.value.adj, sprintf('= %s', row$p.value.adj))
        ) |> invisible()
}

# split by rows
rows <- sig_interaction_effects_df %>% 
  split(seq_len(nrow(sig_interaction_effects_df)))

for (effect_row in rows){ # make list output
  cat("\n")
  cat("-", describe_interaction_effect(effect_row), "\n")
}
  • The Right Thalamus Proper volume of MS males is 0.812 mL bigger, on average, compared to that of MS females after accounting for the main effect of MS (95% CI: (0.554, 1.071), p < 0.0001).

  • The Left Thalamus Proper volume of MS males is 0.893 mL bigger, on average, compared to that of MS females after accounting for the main effect of MS (95% CI: (0.629, 1.157), p < 0.0001).

  • The Right Inferior Lateral Ventricle volume of MS males is 0.121 mL bigger, on average, compared to that of MS females after accounting for the main effect of MS (95% CI: (0.057, 0.185), p = 0.0096).

Visualization

The means for the ROIs with a significant main effect and interaction are plotted below.

get_roi_name <- function(roi) dict_df %>% 
    filter(ROI_INDEX == roi) %>%
    pull(ROI_NAME)

plot_means_main_effect <- function(roi, data=df){
  #' make a box plot for a given roi
  roi_name <- get_roi_name(roi)
  
  data %>% 
    mutate(MS=fct_recode(MS, Yes = "1", No = "0")) %>% 
    ggplot(aes_string(y=roi, x='MS')) + 
    geom_boxplot(fill='purple') + 
    ylab(sprintf("%s (mL)", roi_name)) +
    theme_bw()

}

## make a boxplot for ROIs with significant main effects
 sig_rois_main_effect <- sig_main_effects_df %>% 
  pull(ROI_INDEX) 
 
sig_rois_main_effect_plots <- sig_rois_main_effect %>% 
  purrr::map(~ plot_means_main_effect(.x, data=df)) %>% 
  setNames(sig_rois_main_effect)

Main Effect

We plot ROI volumes to visualize the main effect of MS. The “tabs” can be used to toggle the different plots.

ROI Volumes

for (sig_roi in sig_rois_main_effect){
  cat("##### ",get_roi_name(sig_roi),"\n")
  print(sig_rois_main_effect_plots[[sig_roi]])
  cat('\n\n')
}
Right Thalamus Proper

Left Thalamus Proper

Left Angular Gyrus

Left Calcarine Cortex

Right Frontal Pole

Left Frontal Pole

Right Fusiform Gyrus

Right Inferior Occipital Gyrus

Left Middle Cingulate Gyrus

Left Parietal Operculum

Right Precentral Gyrus

3rd Ventricle

Right Inferior Lateral Ventricle

Left Inferior Lateral Ventricle

Left Lateral Ventricle

Frontal Lobe WM Left

Corpus Callosum

Interaction Effect

ROI volumes minus Age and ICV effect

To better visualize the effect of the MS and Sex interaction, we regress out the effect of Age, Age Squared and ICV on the significant ROIs using the following model:

\[\begin{align} Volume &= \beta_0 + \beta_1*Age + \beta_2*Age^2 + \beta_3*ICV + \epsilon \end{align}\]

In other words, we fit this model and plot the resulting residuals.

plot_means_interaction_effect <- function(roi, data=df){
  #' make a box plot for a given roi
  roi_name <- get_roi_name(roi)
  
  data %>% 
    mutate(MS=fct_recode(MS, Yes = "1", No = "0")) %>% 
    ggplot(aes_string(y=roi, x='MS', fill='Sex')) + 
    geom_boxplot() + 
    ylab(sprintf("Residual of %s  Volume (mL)", roi_name)) +
    theme_bw() 

    
}
# regress out nuisance effects from roi
reg_out_roi <- function(roi){
  #' run model for each ROI
  model <- lm(roi ~ Age + Age^2 + ICV, data = df) 
  model$residuals
}

reg_out_df <- df %>% 
   mutate_at(vars(starts_with('X')), reg_out_roi)

## make a boxplot for ROIs with significant interaction effects
sig_rois_interaction_effect <- sig_interaction_effects_df %>% 
  pull(ROI_INDEX) 

interaction_effect_plots <- sig_rois_interaction_effect %>% 
  purrr::map(~ plot_means_interaction_effect(.x, data=reg_out_df)) %>% 
  setNames(sig_rois_interaction_effect)
Right Thalamus Proper

Left Thalamus Proper

Right Inferior Lateral Ventricle